The Book Structure Extraction Competition with the Resurgence software for part and chapter detection at Caen University
Internal identifier: 000328 (Main/Exploration)
Authors: Emmanuel Giguet [France]; Nadine Lucas [France]
Abstract
The GREYC Island team took part, for the second time, in the Structure Extraction Competition of the INEX Book track, using the Resurgence software. We used a minimal strategy based primarily on a top-down document representation with two levels: part and chapter. The main idea is to use a model that describes the relationships between elements of the document structure. Frontiers between high-level units are detected, first parts and then chapters. The page is also used as a unit. The periphery-center relationship is computed over the entire document and reflected on each page. The strengths of the approach are that it processes the entire document; it handles books without a ToC, as well as titles that do not appear in the ToC (e.g. a preface); it does not depend on a lexicon, and is therefore tolerant to OCR errors and language-independent; and it is simple and fast.
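The ordering described in the abstract (part frontiers first, then chapter frontiers) is a generic top-down, two-level segmentation. The sketch below is a minimal illustration under stated assumptions, not Resurgence's algorithm: the per-page flags `opens_part` and `opens_chapter` and the helpers `split` and `top_down` are hypothetical, and the abstract does not say how frontiers are actually scored or how the periphery-center relationship feeds into them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    number: int
    # Hypothetical boolean cues standing in for whatever evidence a detector
    # uses to decide that a page opens a unit (not specified in the abstract).
    opens_part: bool = False
    opens_chapter: bool = False


def split(pages: List[Page], is_frontier: Callable[[Page], bool]) -> List[List[Page]]:
    """Cut a page sequence before every page where is_frontier(page) holds."""
    segments: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        # A frontier closes the running segment, unless it is still empty.
        if is_frontier(page) and current:
            segments.append(current)
            current = []
        current.append(page)
    if current:
        segments.append(current)
    return segments


def top_down(pages: List[Page]) -> List[List[List[int]]]:
    """Two levels, top-down: part frontiers first, then chapters inside each part."""
    structure = []
    for part in split(pages, lambda p: p.opens_part):
        chapters = split(part, lambda p: p.opens_chapter)
        structure.append([[p.number for p in chapter] for chapter in chapters])
    return structure


if __name__ == "__main__":
    book = [Page(1, opens_part=True), Page(2, opens_chapter=True), Page(3),
            Page(4, opens_chapter=True), Page(5, opens_part=True),
            Page(6, opens_chapter=True), Page(7), Page(8)]
    print(top_down(book))
```

With parts opening on pages 1 and 5 and chapters on pages 2, 4 and 6, `top_down` returns `[[[1], [2, 3], [4]], [[5], [6, 7, 8]]]`: chapter frontiers are only searched inside each part, and the leading one-page segment of each part is its opening page.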
Links to previous steps (curation, corpus...)
- to stream Hal, to step Corpus: 000122
- to stream Hal, to step Curation: 000122
- to stream Hal, to step Checkpoint: 000076
- to stream Main, to step Merge: 000332
- to stream Main, to step Curation: 000328
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">The Book Structure Extraction Competition with the Resurgence software for part and chapter detection at Caen University</title>
<author><name sortKey="Giguet, Emmanuel" sort="Giguet, Emmanuel" uniqKey="Giguet E" first="Emmanuel" last="Giguet">Emmanuel Giguet</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-388300" status="VALID"><orgName>Equipe Hultech - Laboratoire GREYC - UMR6072</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-150" type="direct"></relation>
<relation name="UMR6072" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300358" type="indirect"></relation>
<relation active="#struct-300266" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-150" type="direct"><org type="laboratory" xml:id="struct-150" status="VALID"><orgName>Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen</orgName>
<orgName type="acronym">GREYC</orgName>
<desc><address><addrLine>Boulevard du Maréchal Juin - 14050 CAEN Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.greyc.fr</ref>
</desc>
<listRelation><relation name="UMR6072" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300358" type="direct"></relation>
<relation active="#struct-300266" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR6072" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300358" type="indirect"><org type="institution" xml:id="struct-300358" status="VALID"><orgName>Ecole Nationale Supérieure d'Ingénieurs de Caen</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300266" type="indirect"><org type="institution" xml:id="struct-300266" status="INCOMING"><orgName>Université de Caen Basse-Normandie</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Caen</settlement>
<region type="region" nuts="2">Basse-Normandie</region>
</placeName>
<orgName type="university">Université de Caen Basse-Normandie</orgName>
</affiliation>
</author>
<author><name sortKey="Lucas, Nadine" sort="Lucas, Nadine" uniqKey="Lucas N" first="Nadine" last="Lucas">Nadine Lucas</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-388300" status="VALID"><orgName>Equipe Hultech - Laboratoire GREYC - UMR6072</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-150" type="direct"></relation>
<relation name="UMR6072" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300358" type="indirect"></relation>
<relation active="#struct-300266" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-150" type="direct"><org type="laboratory" xml:id="struct-150" status="VALID"><orgName>Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen</orgName>
<orgName type="acronym">GREYC</orgName>
<desc><address><addrLine>Boulevard du Maréchal Juin - 14050 CAEN Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.greyc.fr</ref>
</desc>
<listRelation><relation name="UMR6072" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300358" type="direct"></relation>
<relation active="#struct-300266" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR6072" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300358" type="indirect"><org type="institution" xml:id="struct-300358" status="VALID"><orgName>Ecole Nationale Supérieure d'Ingénieurs de Caen</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300266" type="indirect"><org type="institution" xml:id="struct-300266" status="INCOMING"><orgName>Université de Caen Basse-Normandie</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Caen</settlement>
<region type="region" nuts="2">Basse-Normandie</region>
</placeName>
<orgName type="university">Université de Caen Basse-Normandie</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01069909</idno>
<idno type="halId">hal-01069909</idno>
<idno type="halUri">https://hal.archives-ouvertes.fr/hal-01069909</idno>
<idno type="url">https://hal.archives-ouvertes.fr/hal-01069909</idno>
<date when="2011-12-12">2011-12-12</date>
<idno type="wicri:Area/Hal/Corpus">000122</idno>
<idno type="wicri:Area/Hal/Curation">000122</idno>
<idno type="wicri:Area/Hal/Checkpoint">000076</idno>
<idno type="wicri:Area/Main/Merge">000332</idno>
<idno type="wicri:Area/Main/Curation">000328</idno>
<idno type="wicri:Area/Main/Exploration">000328</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">The Book Structure Extraction Competition with the Resurgence software for part and chapter detection at Caen University</title>
<author><name sortKey="Giguet, Emmanuel" sort="Giguet, Emmanuel" uniqKey="Giguet E" first="Emmanuel" last="Giguet">Emmanuel Giguet</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-388300" status="VALID"><orgName>Equipe Hultech - Laboratoire GREYC - UMR6072</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-150" type="direct"></relation>
<relation name="UMR6072" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300358" type="indirect"></relation>
<relation active="#struct-300266" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-150" type="direct"><org type="laboratory" xml:id="struct-150" status="VALID"><orgName>Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen</orgName>
<orgName type="acronym">GREYC</orgName>
<desc><address><addrLine>Boulevard du Maréchal Juin - 14050 CAEN Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.greyc.fr</ref>
</desc>
<listRelation><relation name="UMR6072" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300358" type="direct"></relation>
<relation active="#struct-300266" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR6072" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300358" type="indirect"><org type="institution" xml:id="struct-300358" status="VALID"><orgName>Ecole Nationale Supérieure d'Ingénieurs de Caen</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300266" type="indirect"><org type="institution" xml:id="struct-300266" status="INCOMING"><orgName>Université de Caen Basse-Normandie</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Caen</settlement>
<region type="region" nuts="2">Basse-Normandie</region>
</placeName>
<orgName type="university">Université de Caen Basse-Normandie</orgName>
</affiliation>
</author>
<author><name sortKey="Lucas, Nadine" sort="Lucas, Nadine" uniqKey="Lucas N" first="Nadine" last="Lucas">Nadine Lucas</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-388300" status="VALID"><orgName>Equipe Hultech - Laboratoire GREYC - UMR6072</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-150" type="direct"></relation>
<relation name="UMR6072" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300358" type="indirect"></relation>
<relation active="#struct-300266" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-150" type="direct"><org type="laboratory" xml:id="struct-150" status="VALID"><orgName>Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen</orgName>
<orgName type="acronym">GREYC</orgName>
<desc><address><addrLine>Boulevard du Maréchal Juin - 14050 CAEN Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.greyc.fr</ref>
</desc>
<listRelation><relation name="UMR6072" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300358" type="direct"></relation>
<relation active="#struct-300266" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR6072" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300358" type="indirect"><org type="institution" xml:id="struct-300358" status="VALID"><orgName>Ecole Nationale Supérieure d'Ingénieurs de Caen</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300266" type="indirect"><org type="institution" xml:id="struct-300266" status="INCOMING"><orgName>Université de Caen Basse-Normandie</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Caen</settlement>
<region type="region" nuts="2">Basse-Normandie</region>
</placeName>
<orgName type="university">Université de Caen Basse-Normandie</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The GREYC Island team participated in the Structure Extraction Competition part of the INEX Book track for the second time, with the Resurgence software. We used a minimal strategy primarily based on top-down document representation with two levels, part and chapter. The main idea is to use a model describing relationships for elements in the document structure. Frontiers between high-level units are detected, parts and then chapters. Page is also used. The periphery center relationship is calculated on the entire document and reflected on each page. The strong points of the approach are that it deals with the entire document; it handles books without ToCs, and titles that are not represented in the ToC (e. g. preface); it is not dependent on lexicon, hence tolerant to OCR errors and language independent; it is simple and fast.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Basse-Normandie</li>
</region>
<settlement><li>Caen</li>
</settlement>
<orgName><li>Université de Caen Basse-Normandie</li>
</orgName>
</list>
<tree><country name="France"><region name="Basse-Normandie"><name sortKey="Giguet, Emmanuel" sort="Giguet, Emmanuel" uniqKey="Giguet E" first="Emmanuel" last="Giguet">Emmanuel Giguet</name>
</region>
<name sortKey="Lucas, Nadine" sort="Lucas, Nadine" uniqKey="Lucas N" first="Nadine" last="Lucas">Nadine Lucas</name>
</country>
</tree>
</affiliations>
</record>
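The record above uses the `hal:` and `wicri:` prefixes without declaring them, so a strict XML parser rejects it with an "unbound prefix" error. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`, that injects placeholder declarations and then reads the title, the authors' sort keys and the HAL identifier; the namespace URIs and the helper `parse_record` are hypothetical, not official Wicri or HAL names.

```python
import xml.etree.ElementTree as ET

# Placeholder (hypothetical) namespace URIs: the record does not declare the
# hal: and wicri: prefixes, so we bind them to dummy URIs before parsing.
NS_DECL = ' xmlns:hal="urn:example:hal" xmlns:wicri="urn:example:wicri"'


def parse_record(text: str) -> dict:
    """Return the title, author sort keys and HAL id of a Wicri TEI record."""
    # Inject the missing namespace declarations into the <record> start tag.
    fixed = text.replace("<record>", "<record" + NS_DECL + ">", 1)
    root = ET.fromstring(fixed)
    file_desc = root.find("./TEI/teiHeader/fileDesc")
    title_stmt = file_desc.find("titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        # Each <author> carries a <name> whose sort attribute is "Last, First".
        "authors": [n.get("sort") for n in title_stmt.findall("./author/name")],
        "halId": file_desc.findtext("./publicationStmt/idno[@type='halId']"),
    }


# Trimmed excerpt of the record (most affiliation detail removed).
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">The Book Structure Extraction Competition with the Resurgence software for part and chapter detection at Caen University</title>
<author><name sort="Giguet, Emmanuel">Emmanuel Giguet</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam"><orgName>Equipe Hultech - Laboratoire GREYC - UMR6072</orgName></hal:affiliation></affiliation>
</author>
<author><name sort="Lucas, Nadine">Nadine Lucas</name></author>
</titleStmt>
<publicationStmt><idno type="halId">hal-01069909</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

if __name__ == "__main__":
    print(parse_record(SAMPLE))
```

Injecting declarations keeps the element names intact; the alternative of stripping the prefixes with text substitution would also work but silently changes the tag names.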
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000328 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000328 | SxmlIndent | more
To link to this page from the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Hal:hal-01069909 |texte= The Book Structure Extraction Competition with the Resurgence software for part and chapter detection at Caen University }}
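To link other records the same way, the template call can be generated from its fields. A minimal sketch with a hypothetical helper `explor_link`; the field order and spacing are simply copied from the template call above, and whether the template accepts other orderings is an assumption left untested here.

```python
def explor_link(wiki: str, area: str, flux: str, etape: str,
                key_type: str, key: str, text: str) -> str:
    """Assemble an {{Explor lien}} template call (hypothetical helper)."""
    # Parameter names are kept exactly as the template spells them,
    # including the French "étape", "clé" and "texte".
    fields = [("wiki", wiki), ("area", area), ("flux", flux), ("étape", etape),
              ("type", key_type), ("clé", key), ("texte", text)]
    return "{{Explor lien " + " ".join(f"|{name}= {value}" for name, value in fields) + " }}"


if __name__ == "__main__":
    print(explor_link("Ticri/CIDE", "OcrV1", "Main", "Exploration", "RBID",
                      "Hal:hal-01069909",
                      "The Book Structure Extraction Competition with the Resurgence "
                      "software for part and chapter detection at Caen University"))
```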
This area was generated with Dilib version V0.6.32.